Using Metatab Resources In Pandas

There are two ways to use Metatab data package resources in Pandas. One is to use the CSV files directly, which is easy to do if the package is published to a repository. However, it is better to use the Metatab module to load the package metadata and create dataframes.

Using CSV Files Directly

The simplest way to use a file in a Metatab package is to load its CSV file directly. You can get the CSV file URL from the data repository page, such as the page for the ADOD Prevalence data in the San Diego Elder Dementia dataset.

While this is simple and portable, it does not give you the features of Metatab, such as built-in schema documentation.


In [3]:
import pandas as pd

df = pd.read_csv('http://s3.amazonaws.com/library.metatab.org/sandiegocounty.gov-adod-2012-sra-3/data/adod-prevalence.csv')

df.head()


Out[3]:
                region  adod_prevelance_2012  adod_prevelance_2020  adod_prevelance_2030
0    Central San Diego                3193.0                  4958                  6424
1             Mid-City                3136.0                  3698                  4227
2  Southeast San Diego                3203.0                  4160                  4985
3                 East               14865.0                 18410                 21489
4               Alpine                 339.0                   426                   555
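
In the output above, the 2012 column displays as floats (3193.0) while the other two columns are integers. With pd.read_csv() this commonly happens when a column contains a blank cell, although that is only an assumption about this particular file. The sketch below illustrates the effect on a small inline sample instead of the published URL: the first two rows are copied from the output above, and the last row is hypothetical, added only to introduce a missing value. It then converts the column to the nullable Int64 type in Pandas.

```python
import io

import pandas as pd

# The first two data rows are copied from the output above. The last row is
# hypothetical and exists only to show the effect of a missing value.
sample_csv = """region,adod_prevelance_2012,adod_prevelance_2020,adod_prevelance_2030
Central San Diego,3193,4958,6424
Mid-City,3136,3698,4227
Hypothetical Region,,100,120
"""

df = pd.read_csv(io.StringIO(sample_csv))

# The blank cell makes Pandas infer float64 for the 2012 column,
# while the complete columns are inferred as int64.
print(df.dtypes)

# Convert to the nullable integer type: values stay integers and the
# blank cell becomes <NA> instead of NaN.
df["adod_prevelance_2012"] = df["adod_prevelance_2012"].astype("Int64")
print(df)
```

Whether to convert is a judgment call: the float column is harmless for arithmetic, but the nullable integer type keeps counts looking like counts.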

Using the Metatab Package

The second way to access a package is to use the Metatab module. This method requires installing the metatab Python package, but it has an important advantage: it gives you direct access to package and dataset documentation. You can load any type of Metatab package with the open_package() function, but for the highest performance, you should use the CSV package. Opening a CSV package loads only the metadata and the resources you need, while using a ZIP or Excel package requires downloading the entire package first.

To find the CSV package in a package that is published to a CKAN repository, look for a CSV file with the description "CSV Package Metadata in Metatab format". For the ADOD package, this file is named sandiegocounty.gov-adod-2012-sra-3.csv.

Opening the package returns a Metatab document object. If you display it in Jupyter, the output cell will display the package documentation.


In [7]:
import metatab
doc = metatab.open_package('http://s3.amazonaws.com/library.metatab.org/sandiegocounty.gov-adod-2012-sra-3.csv')
doc


Out[7]:

San Diego Elder Dementia

sandiegocounty.gov-adod-2012-sra-3

Current (2012) ADOD and General population data along with projections for 2020 and 2030 for San Diego county

Documentation

SD County HHSA Reports None

SD County HHSA ADOD Packet Upates Updates to the SD HHSA ADOD profiles in the county. All data is extracted from this document

Contacts

Origin: County of San Diego Health and Human Services Agency

Creator: Lesie Ray

Wrangler: Rashmi Keshava Iyengar San Diego Regional Data Library

Wrangler: Eric Busboom Civic Knowledge

Resources

  1. adod-prevalence - http://s3.amazonaws.com/library.metatab.org/sandiegocounty.gov-adod-2012-sra-3/data/adod-prevalence.csv Table 1. Estimates of Prevalence of Alzheimer's Disease and Other Dementias by Subregional Area, 55 Years and Over, San Diego County, 2012 - 2030

  2. hospital-discharge - http://s3.amazonaws.com/library.metatab.org/sandiegocounty.gov-adod-2012-sra-3/data/hospital-discharge.csv Table 2. Number of Emergency Department or Hospital Discharged Patients with Any Mention of Alzheimer's Disease and Other Dementias by Subregional Area, 55 Years and Over, San Diego County, 2012 - 2030

  3. elder-population-2012 - http://s3.amazonaws.com/library.metatab.org/sandiegocounty.gov-adod-2012-sra-3/data/elder-population-2012.csv Table 3. 2012 Population by Age Group and Subregional Area, 55 Years and Over, San Diego County

  4. elder-population-2020 - http://s3.amazonaws.com/library.metatab.org/sandiegocounty.gov-adod-2012-sra-3/data/elder-population-2020.csv Table 4. 2020 Population Projections by Age Group and Subregional Area, 55 Years and Over, San Diego County

  5. elder-population-2030 - http://s3.amazonaws.com/library.metatab.org/sandiegocounty.gov-adod-2012-sra-3/data/elder-population-2030.csv Table 5. 2030 Population Projections by Age Group and Subregional Area, 55 Years and Over, San Diego County

The .resource() method returns one of the resources. Displaying it shows the resource documentation.


In [4]:
r = doc.resource('adod-prevalence')
r


Out[4]:

adod-prevalence

http://s3.amazonaws.com/library.metatab.org/sandiegocounty.gov-adod-2012-sra-3/data/adod-prevalence.csv

Header                Type     Description
region                text
adod_prevelance_2012  integer
adod_prevelance_2020  integer
adod_prevelance_2030  integer

Once you have a resource, use the .dataframe() method to get a Pandas dataframe.


In [6]:
df = r.dataframe()
df.head()


Out[6]:
                region  adod_prevelance_2012  adod_prevelance_2020  adod_prevelance_2030
0    Central San Diego                3193.0                  4958                  6424
1             Mid-City                3136.0                  3698                  4227
2  Southeast San Diego                3203.0                  4160                  4985
3                 East               14865.0                 18410                 21489
4               Alpine                 339.0                   426                   555
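
The three steps shown above (open the package, pick a resource, ask for a dataframe) can be combined when you want several resources at once. The following is a minimal sketch that uses only the calls shown in this notebook: open_package(), .resource() and .dataframe(). The helper name load_resource_frames and its open_package parameter are hypothetical, introduced here for illustration; the parameter lets you substitute a stand-in opener when you are offline or testing.

```python
def load_resource_frames(package_url, resource_names, open_package=None):
    """Return a dict mapping each resource name to its dataframe.

    Hypothetical helper, not part of the metatab package. It relies only
    on the open_package(), .resource() and .dataframe() calls shown above.
    """
    if open_package is None:
        # Imported lazily so that the helper can be defined, and used with
        # a stand-in opener, without the metatab package installed.
        import metatab
        open_package = metatab.open_package

    # Open the package once, then build one dataframe per requested resource.
    doc = open_package(package_url)
    return {name: doc.resource(name).dataframe() for name in resource_names}
```

For the ADOD package, calling load_resource_frames() with the CSV package URL used earlier and the names ['adod-prevalence', 'hospital-discharge'] should return two dataframes keyed by resource name, with the package metadata loaded only once.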
